Integer time series models for tuberculosis in Africa

Tuberculosis, an airborne disease, is the deadliest human infectious disease caused by a single agent. The African region is among the areas most affected and most heavily burdened by tuberculosis cases. In this paper, we modeled the number of new cases of tuberculosis for 2000–2021 by integer time series models. For each African country, we fitted twenty different models and selected the model that best fitted the data. The twenty models were mostly based on the number of new cases following either the Poisson or negative binomial distribution, with the rate parameter allowed to vary linearly or quadratically with respect to year. The best fitted models were used to give predictions for 2022–2031.

dynamics of the disease as interest here is in the count of the number of people infected with the disease. Therefore, we use a more statistical approach. Our modeling approach does not involve simulations, but instead fits a curve to the historical data. This way of modeling simply reflects the observed relationship between variables and attempts to forecast future results based on the past data (Dhlakama et al. 8).
There has been research concerning the modeling of TB. Liu et al. 9 developed a TB model incorporating seasonality based on data from China. They developed a compartmental model to describe TB seasonal incidence rate by incorporating periodic coefficients and discovered that there is a seasonal pattern of new TB cases, with those numbers peaking in late spring to early summer. Their research attributed the seasonal pattern to the Chinese Spring Festival, and/or to the more frequent viral infections like flu, which can lead to reactivation of the Mtb.
Using 462,214 pulmonary TB cases over a period of 10 years as training data, Liu et al. 10 used two different approaches to predict the number of TB cases in Jiangsu Province, China. The first approach, autoregressive integrated moving average (ARIMA), is a statistical method that uses past data to predict future trends. The second approach, back-propagation neural network (BPNN), is a neural network method that uses statistical machine learning to learn from data. Both approaches were able to predict seasonality and trend of pulmonary TB in the Chinese population, but the BPNN approach was slightly more accurate.
Using multilevel modelling, Dhlakama et al. 8 investigated factors that influence self-reported TB cases from 2008 to 2017. They discovered several variables that had a significant impact on TB such as marital status, gender, race, unemployment, other diseases, exercise patterns, smoking patterns, health consultation, asthma diagnosis, diabetes diagnosis, housing quality and household. The results were the same using both frequentist and Bayesian models even with informative priors.
In this paper, we model TB in Africa as measured by the incidence rate per 100,000 people. We model the distribution of the number of new cases of TB in a given year conditioned on the history of the number of cases up to that year. For each country, we fitted the incidence data using twenty models for integer time series and selected the best fitting model. Most of the twenty models were based on the number of new cases following either the Poisson or negative binomial distribution with the rate parameter allowed to vary linearly or quadratically with respect to year. Others were based on the number of new cases following either the Poisson or negative binomial distribution with the rate parameter allowed to depend on the previous numbers of new cases. To the best of our knowledge, there are no papers modeling incidence rates of TB for all African countries.
The rest of the paper is structured as follows. The "Data" section discusses the data, the "Models" section discusses the models used to fit the data, the "Results and discussion" section discusses the results, and the paper is concluded in the "Conclusions" section.

Data
According to the World Bank, TB incidence is the rate per 100,000 population of new and relapse TB cases that occur in a year. This number covers all types of TB such as other members of the Mtb complex and also cases in people living with HIV (World Bank 11).

Models
Let $Z_t$ denote the number of new cases of TB reported in year $t$, $t = 2000, 2001, \ldots, 2021$. Let $F_t$ denote the history of the number of new cases up to and including year $t$. For each of the fifty two countries, the following models were fitted:
• $Z_t \mid F_{t-1} \sim \text{Poisson}(\beta_0)$, referred to as the identity Poisson model;
• $Z_t \mid F_{t-1} \sim \text{Poisson}(\exp(\beta_0))$, referred to as the log Poisson model;
• $Z_t \mid F_{t-1} \sim \text{Negative Binomial}(\beta_0, \phi)$, referred to as the identity negative binomial model;
• $Z_t \mid F_{t-1} \sim \text{Negative Binomial}(\exp(\beta_0), \phi)$, referred to as the log negative binomial model;
• a Poisson model whose rate depends on $Z_{t-1}$ through the identity link, referred to as the identity Poisson model regressed on the previous observation;
• a Poisson model whose rate depends on $Z_{t-1}$ through the log link, referred to as the log Poisson model regressed on the previous observation;
• a negative binomial model with parameter $\phi$ whose rate depends on $Z_{t-1}$ through the identity link, referred to as the identity negative binomial model regressed on the previous observation;
• a negative binomial model with parameter $\phi$ whose rate depends on $Z_{t-1}$ through the log link, referred to as the log negative binomial model regressed on the previous observation;
• a Poisson model whose rate depends on $Z_{t-1}$ and $Z_{t-2}$ through the identity link, referred to as the identity Poisson model regressed on the two previous observations;
• a Poisson model whose rate depends on $Z_{t-1}$ and $Z_{t-2}$ through the log link, referred to as the log Poisson model regressed on the two previous observations;
• a negative binomial model with parameter $\phi$ whose rate depends on $Z_{t-1}$ and $Z_{t-2}$ through the identity link, referred to as the identity negative binomial model regressed on the two previous observations;
• a negative binomial model with parameter $\phi$ whose rate depends on $Z_{t-1}$ and $Z_{t-2}$ through the log link, referred to as the log negative binomial model regressed on the two previous observations;
• $Z_t \mid F_{t-1} \sim \text{Poisson}(\beta_0 + \beta_1 t)$, referred to as the identity Poisson model regressed linearly with respect to year;
• $Z_t \mid F_{t-1} \sim \text{Poisson}(\exp(\beta_0 + \beta_1 t))$, referred to as the log Poisson model regressed linearly with respect to year;
• $Z_t \mid F_{t-1} \sim \text{Negative Binomial}(\beta_0 + \beta_1 t, \phi)$, referred to as the identity negative binomial model regressed linearly with respect to year;
• $Z_t \mid F_{t-1} \sim \text{Negative Binomial}(\exp(\beta_0 + \beta_1 t), \phi)$, referred to as the log negative binomial model regressed linearly with respect to year;
• $Z_t \mid F_{t-1} \sim \text{Poisson}(\beta_0 + \beta_1 t + \beta_2 t^2)$, referred to as the identity Poisson model regressed quadratically with respect to year;
• $Z_t \mid F_{t-1} \sim \text{Poisson}(\exp(\beta_0 + \beta_1 t + \beta_2 t^2))$, referred to as the log Poisson model regressed quadratically with respect to year;
• $Z_t \mid F_{t-1} \sim \text{Negative Binomial}(\beta_0 + \beta_1 t + \beta_2 t^2, \phi)$, referred to as the identity negative binomial model regressed quadratically with respect to year;
• $Z_t \mid F_{t-1} \sim \text{Negative Binomial}(\exp(\beta_0 + \beta_1 t + \beta_2 t^2), \phi)$, referred to as the log negative binomial model regressed quadratically with respect to year,
where $\beta_0$, $\beta_1$ and $\beta_2$ are the regression coefficients. These models are due to Fokianos et al. 12, Fokianos and Tjostheim 13, Fokianos and Fried 14 and Christou and Fokianos 15,16. The models due to Christou and Fokianos 15 are based on the negative binomial distribution. The models due to the others are based on the Poisson distribution. Both Poisson and negative binomial distributions are commonly used when dealing with count data.
The stated models were fitted by the method of maximum likelihood, that is, by maximizing the likelihood functions of the Poisson and negative binomial models, respectively, with respect to $\beta_0$, $\beta_1$, $\beta_2$ and $\phi$. We shall denote their maximum likelihood estimates by $\hat{\beta}_0$, $\hat{\beta}_1$, $\hat{\beta}_2$ and $\hat{\phi}$, respectively. The likelihood functions were maximized using the command tsglm in the R package tscount (Liboschik et al. 17).
The parameter estimates, 95% confidence intervals, values of the AIC and BIC, and p-values of the Kolmogorov-Smirnov test are given in:
• Table 1 when the best fitted model was the identity Poisson model;
• Table 2 when the best fitted model was the log Poisson model;
• Table 3 when the best fitted model was the log Poisson model regressed linearly with respect to year;
• Table 4 when the best fitted model was the log Poisson model regressed on the previous observation;
• Table 5 when the best fitted model was the identity Poisson model regressed on the previous observation;
• Table 6 when the best fitted model was the log negative binomial model regressed on the two previous observations;
• Table 7 when the best fitted model was the log negative binomial model regressed linearly with respect to year;
• Table 8 when the best fitted model was the log negative binomial model regressed on the previous observation.
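For reference, the two likelihood functions maximized in this section can be written in a standard form. The following is a sketch under assumptions rather than a reproduction of the displayed equations: $\lambda_t$ is notation introduced here for the conditional rate implied by the chosen model (for example, $\lambda_t = \exp(\beta_0 + \beta_1 t)$ for the log models regressed linearly with respect to year), the products run over the years used in the fit, and the negative binomial is assumed to be parametrized by its mean $\lambda_t$ and the parameter $\phi$, as in the tscount documentation.

```latex
% Poisson likelihood, with conditional rate \lambda_t (assumed notation)
L_{\mathrm{P}}(\beta_0, \beta_1, \beta_2)
  = \prod_{t} \frac{\lambda_t^{Z_t} \, e^{-\lambda_t}}{Z_t!}

% Negative binomial likelihood, mean \lambda_t and parameter \phi (assumed parametrization)
L_{\mathrm{NB}}(\beta_0, \beta_1, \beta_2, \phi)
  = \prod_{t} \frac{\Gamma(\phi + Z_t)}{\Gamma(Z_t + 1) \, \Gamma(\phi)}
    \left( \frac{\phi}{\phi + \lambda_t} \right)^{\phi}
    \left( \frac{\lambda_t}{\phi + \lambda_t} \right)^{Z_t}
```

Under this parametrization the conditional variance is $\lambda_t + \lambda_t^2 / \phi$, so the negative binomial models allow for overdispersion and approach the Poisson models as $\phi \to \infty$.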

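As an illustration of how one of the models above can be fitted by maximum likelihood and compared with a simpler one, the following is a minimal sketch in Python. It is not the tsglm routine used for the results. It fits only the log Poisson model regressed linearly with respect to year, using Newton's method, and compares it with the constant-rate log Poisson model by AIC and BIC. The function names, the centring of the year variable, the use of AIC and BIC for the comparison, and the data (a made-up, smoothly declining series) are all assumptions of this sketch.

```python
import math


def fit_log_poisson_trend(years, counts, max_iter=100, tol=1e-10):
    """Maximum likelihood fit of Z_t ~ Poisson(exp(b0 + b1 * (t - t0))).

    The year is centred at its mean t0 for numerical stability; in the
    uncentred form exp(beta0 + beta1 * t), beta1 = b1 and beta0 = b0 - b1 * t0.
    """
    n = len(years)
    t0 = sum(years) / n
    x = [t - t0 for t in years]
    b0, b1 = math.log(sum(counts) / n), 0.0  # start from the constant-rate fit
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood with respect to (b0, b1)
        g0 = sum(z - m for z, m in zip(counts, mu))
        g1 = sum((z - m) * xi for z, m, xi in zip(counts, mu, x))
        # Entries of the 2x2 information matrix (negative Hessian)
        h00 = sum(mu)
        h01 = sum(m * xi for m, xi in zip(mu, x))
        h11 = sum(m * xi * xi for m, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 linear system explicitly
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1, t0


def poisson_loglik(counts, means):
    """Poisson log-likelihood of the counts given their fitted means."""
    return sum(z * math.log(m) - m - math.lgamma(z + 1)
               for z, m in zip(counts, means))


def aic_bic(loglik, n_params, n_obs):
    """Return (AIC, BIC) for a fitted model."""
    return 2 * n_params - 2 * loglik, n_params * math.log(n_obs) - 2 * loglik


# Made-up incidence series for 2000-2021 (hypothetical, not World Bank data)
years = list(range(2000, 2022))
counts = [round(300 * math.exp(-0.03 * (t - 2000))) for t in years]

b0, b1, t0 = fit_log_poisson_trend(years, counts)
trend_means = [math.exp(b0 + b1 * (t - t0)) for t in years]
# For the constant-rate Poisson model the maximum likelihood rate is the sample mean
const_means = [sum(counts) / len(counts)] * len(counts)

aic_trend, bic_trend = aic_bic(poisson_loglik(counts, trend_means), 2, len(counts))
aic_const, bic_const = aic_bic(poisson_loglik(counts, const_means), 1, len(counts))

# Point forecasts for 2022-2031 from the fitted trend model
forecast = {t: math.exp(b0 + b1 * (t - t0)) for t in range(2022, 2032)}

print("estimated slope per year:", b1)
print("AIC (trend, constant):", aic_trend, aic_const)
print("BIC (trend, constant):", bic_trend, bic_const)
print("forecast for 2031:", forecast[2031])
```

In this sketch the model with the smaller AIC or BIC would be preferred. Models with lagged observations or negative binomial errors need a more general optimizer and are not covered here.
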
Results and discussion
Models based on the negative binomial distribution gave the better fit for Sao Tome and Principe, Eritrea, Seychelles, Tanzania, Cabo Verde, Djibouti, Eswatini, Kenya, Lesotho, Malawi, Namibia, South Africa and Zimbabwe. Models based on the Poisson distribution gave the better fit for Chad, Comoros, the Democratic Republic of
and Botswana's decline has been especially sharp since 2005, and despite a slight rise during the COVID-19 pandemic, it is expected to continue dropping until 2030, though at a slower pace. Burkina Faso and Benin are projected to maintain their historical rates of decline, while Burundi is likely to accelerate its reduction of cases. Angola, on the other hand, had a rapid increase in infections from 2000 to 2010, followed by a decrease until 2020. The model predicts a modest rise in cases for Angola in the next decade. The Central African Republic has shown no significant change in its infection rate for the past 20 years, and the model does not anticipate any changes by 2030.
Figure 4 shows that Cote d'Ivoire and Cameroon are projected to have rising infection rates, which is in line with their historical trends over the past 18 years. The Democratic Republic of Congo and Comoros, however, are expected to have stable infection rates in the future (DRC rates remain constant after a sudden jump). The former had a nearly constant number of cases for the first 11 years, followed by a decade of decline. The latter had a decreasing trend for the first decade, then a non-decreasing trend for the next five years, and then a constant rate. The Republic of Congo, which had been declining for the last 10 years, is forecasted to have a slight increase in cases. Cabo Verde, which had fluctuating infection rates for the first 13 years and then non-increasing rates for the next nine years, is predicted to have an increasing trend until at least 2030.
According to Fig. 5, the model forecasts a decrease in cases for the next decade or so, based on the consistent decline over the years for both Ethiopia and Egypt. Eritrea, which has a less consistent historical trend than Ethiopia and Egypt, is also projected to have a reduction in incidences. On the other hand, Djibouti, Algeria, and Gabon are expected to have an increase in cases, despite having some periods of decline in the past. This is an interesting observation that warrants further investigation.
Figure 6 shows that Ghana has shown a consistent decrease in its infection rates over the years, and the model anticipates a continued decline. Guinea, which had a decrease from 2000 to 2010 and then a stable rate until 2021, is also forecasted to have a decline in cases. Gambia, which had a varying rate of decrease from 2005 to 2021, is expected to have a lower rate of decrease starting from a higher level than the last historical point. Guinea Bissau, which had an increase until 2005 and then a constant rate until 2021, is projected to have a slight dip and then a stable rate. Equatorial Guinea, which had mostly increasing rates with some outliers, is predicted to have a further increase in cases. Kenya, which had a decreasing trend for the last 15 years, is forecasted to reverse its trend and have an increasing rate until 2030.
According to Fig. 7, Liberia had an increasing trend for 13 years until it stabilized at around 308 cases per 100,000 people. The model predicts a further increase with high confidence. Lesotho had a quadratic increase until 2007, followed by a period of stagnation and then a decrease, possibly due to interventions for drug-resistant TB (Satti et al. 22). The model forecasts a reversal of the decreasing trend after 2022. Morocco had fluctuating rates over the past two decades, but the model expects a rise and then a stabilization at around 100 cases. Mozambique had a constant rate of 361 cases, with wide confidence intervals, and the model does not anticipate any change. Mali and Madagascar are expected to decrease their incidence rates with relatively high confidence.
According to Fig. 8, Mauritania and Niger had decreasing trends from 2000 to 2021, possibly due to the World Health Organization's End TB strategy launched in 2016 and 2019, respectively (Aw et al. 23 ). The model predicts similar rates of decline for both countries. Mauritius had a fluctuating trend, with peaks and troughs every two years. The model forecasts a stable rate, with a constant mean but a piecewise constant median. Malawi and Namibia had similar patterns, with a decrease in cases around 2005 and then a stabilization. The model forecasts increasing trends for both countries from 2022 to 2030. Nigeria had a stable rate from 2000 to 2021, and the model expects no change in the future. The observed and forecasted patterns are consistent for Mauritius and Nigeria. Nigeria is predicted to have a constant incidence of TB at around 220 per 100,000 people.
According to Fig. 9, Rwanda had a fluctuating but generally increasing trend until 2008, when it started to decline. The model predicts significant increases in the future. Sudan and Senegal had decreasing trends and are expected to continue this decline. Sierra Leone and Somalia had decreasing trends in the last 10 years, but are forecasted to have increasing rates in the next decade. Sao Tome and Principe had a cyclical pattern of increases and decreases, and is projected to have a slight increase and then a stabilization until 2030.
According to Fig. 10, Eswatini had an increasing trend for the first 10 years, followed by a 15-year period of decline, except for two years with a significant rise. The model forecasts modest increases from 2022 to 2030. Seychelles had fluctuating rates but is expected to have a decline in rates. Togo had a concave-down curve, with a peak around 2005 and then a decline. The model predicts a rise in cases in the future. Chad had a decreasing trend throughout the dataset, and the model expects a continued decline. Tunisia had a relatively stable rate, and the model does not anticipate any change. Tanzania had a decreasing trend, and the model projects a further decrease until 2030.
According to Fig. 11, Uganda had a stable rate of infection for about 10 years, but the model forecasts a sharp decline from 2022 to 2030. South Africa had a very high rate of TB until 2010, when it started to decline steadily. The model predicts a slightly increasing, stable rate for the next 8 years. Zambia had a rapid decline in infection rates from 2000 to 2021, and the model expects a continued decrease in the future. Zimbabwe had a decreasing trend since 2004, but the model forecasts an increase in cases in the next decade.
Angola (slightly), Algeria, Democratic Republic of Congo, Cameroon, Cabo Verde, Djibouti, Equatorial Guinea, Eswatini, Gabon, Kenya, Malawi, Namibia, Rwanda, Sao Tome and Principe, Sierra Leone, Somalia, Togo and Zimbabwe are expected to experience increases in TB infection rates. The increase in TB cases in Equatorial Guinea could be due to a lack of important knowledge about TB and poor attitudes among caregivers, which make TB one of the major causes of morbidity and mortality in Equatorial Guinea (Vericat-Ferrer et al. 24). According to Kitimo 25, a representative of the Intergovernmental Authority on Development Mission to Kenya was quoted as saying that mixed migration flows and the lack of a government data sharing mechanism had resulted in ineffective anti-TB campaigns in the region. Cross-border movement is viewed as a factor in the TB infection rates in Kenya and a significant barrier to the fight against the disease. It is, however, important to note that Ethiopia, a neighboring country, is expected to experience a decrease in incidence.
Southern Africa accounts for a third of the world's countries with the highest TB burdens. It is also a hotspot for TB/HIV co-infections, with Mozambique, Malawi, Lesotho and Zambia being significant actors (World Bank 26).
To help mitigate Southern Africa's complex TB and HIV epidemics and the closely related occupational lung diseases, the World Bank Board of Directors, on 19 June 2020, approved $56 million additional financing from the International Development Association (IDA). This new financing brought the total World Bank financing for the Southern Africa Tuberculosis and Health Systems Support Project (SATBHSSP) to $178 million, covering Lesotho, Malawi, Mozambique and Zambia with the goal of enhancing TB case detection and treatment (World Bank 26 ). Our projections, however, predict a rise in TB infections for all these countries except Zambia, where we forecast a steady decrease. This may not necessarily be a bad thing if we find that the injection of funds led to higher detection rates.
In recent years, Southern Africa began turning the tide against TB (World Health Organization 3,5). Six countries in Southern Africa achieved reductions of 4-10% per year in TB incidence following a peak in the HIV epidemic. These countries are Botswana, Eswatini, Lesotho, Namibia, South Africa and Zimbabwe. As mentioned earlier, our models predict increases in TB cases for Eswatini, Malawi, Namibia, Lesotho, South Africa, and Zimbabwe. Mozambique is virtually constant, and Botswana and Madagascar are projected to decrease.
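To put annual reductions of 4-10% in perspective, the cumulative effect of a constant proportional decline can be worked out directly. The sketch below is illustrative arithmetic only, not output of the fitted models. It assumes a constant annual rate of decline and a ten-year horizon (chosen here to match the 2022–2031 forecast window), and it uses a 90% cumulative reduction as the benchmark. The function names are hypothetical.

```python
def cumulative_reduction(annual_decline, years):
    """Total fractional reduction after `years` of constant proportional decline."""
    return 1.0 - (1.0 - annual_decline) ** years


def required_annual_decline(target_reduction, years):
    """Constant annual decline needed to reach `target_reduction` within `years`."""
    return 1.0 - (1.0 - target_reduction) ** (1.0 / years)


horizon = 10  # assumed horizon in years, matching the 2022-2031 forecast window

for rate in (0.04, 0.10):
    total = cumulative_reduction(rate, horizon)
    print(f"{rate:.0%} per year over {horizon} years: {total:.1%} total reduction")

needed = required_annual_decline(0.90, horizon)
print(f"90% reduction within {horizon} years needs {needed:.1%} per year")
```

Under these assumptions, a decline of 4% per year gives a total reduction of about 34% over ten years and a decline of 10% per year gives about 65%, whereas a 90% reduction within ten years would need a sustained decline of about 21% per year.
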
We note that the incidence of TB in North Africa seems much lower than in other regions. Algeria, Egypt, Morocco, Sudan, and Tunisia had 54, 10, 94, 58, and 36 cases per 100,000 people, respectively. Reasons for such relatively low incidence rates may include the fact that North African countries tend to be more economically developed than other African countries, which translates into more resources to invest in public health measures, including TB control. Superior access to health care helps with diagnosis and treatment, and lower HIV prevalence (HIV being a major risk factor) contributes to lower rates of TB. Our models forecast a slight increase followed by a decrease in Sudan, a gradual and consistent decrease in Mauritania, a somewhat sharp increase immediately followed by a period of stagnation in Morocco, a concave increase in Algeria, a small increase followed by years of decreasing incidence rates in Egypt, and virtually no change in Tunisia. Eltayeb et al. 27
in the Middle East and North Africa regions, TB awareness and interventions targeting the elderly and those from lower-income settings, particularly directed at gender differences, are essential.
One limitation of our study could be the quality of our data. It is also worth noting that many of the countries have suboptimal case detection rates. This means that the numbers obtained from the World Bank almost certainly underestimate the true incidence rates. Another important limitation is that the models in some cases seem sensitive to upticks in incidence following the COVID-19 pandemic.

Conclusions
We have modeled the new cases of TB reported in 2000, 2001, ..., 2021 for fifty two African countries. We used integer time series models due to Fokianos et al. 12, Fokianos and Tjostheim 13, Fokianos and Fried 14 and Christou and Fokianos 15,16. Based on the best fitted models, we obtained point estimates as well as prediction intervals of incidence rates for 2022–2031. Except in rare instances, the prediction intervals were rather wide. An epidemiological model might have provided less variable predictions, although it would rest on many assumptions.
Assuming the forecasts are accurate, the model predicts how the infection rates may change or stay the same without interventions. Countries with rising or stable rates should act more forcefully to reduce infections. Countries with falling rates should not relax their efforts, but keep following their current strategies.
The results show that the fears of certain researchers about the attainability of the SDG goals are well founded. The forecasts show that TB incidence will not be reduced by 90% unless decisive action is taken by policy makers and health care practitioners. Tunisia, Niger, Mauritania, and Egypt are the only countries that may have a realistic chance of eliminating TB.
According to Silva et al. 6 , recommendations to combat TB could include: augmenting domestic financial resources, forging mutually beneficial alliances with the private sector, securing more international financial assistance, establishing venues for interaction, emulating efficacious practices at national and regional levels (African Union 28 ), and expeditiously developing and adopting novel tools, interventions, and strategies.
To fight TB, governments, organizations, and influential people need to join forces with the same alacrity, political commitment, and resolve displayed during the COVID-19 pandemic. The improved ability to test and sequence genes that was developed to fight COVID-19 can help find TB cases better, a key step in stopping the disease from spreading and helping those who have it survive (World Health Organization 29 ).
Finally, a new vaccine would help to drastically reduce the incidence of TB. In countries where TB is common, Bacille Calmette-Guérin (BCG), the only licensed TB vaccine in the world, is effective in protecting against TB meningitis and disseminated TB. However, it has not been very effective for teens and adults as well as those with pulmonary TB, particularly in developing countries around the world. A new vaccine is needed and should be efficacious in protecting against developing TB, thus reducing spread and also leading to drastic reductions in mortality. According to Davenne and McShane 30, one idea that may lead to better immunization is the use of aerosol vaccines. Such vaccines reproduce the natural route of infection of Mtb, as they directly target alveolar macrophages.

Data availability
The data can be obtained from the corresponding author.

Code availability
The codes can be obtained from the corresponding author.